1.
Optim Lett ; 17(5): 1133-1159, 2023 Jun.
Article in English | MEDLINE | ID: mdl-38516636

ABSTRACT

The task of projecting onto ℓp norm balls is ubiquitous in statistics and machine learning, yet the availability of actionable algorithms for doing so is largely limited to the special cases of p∈{0, 1, 2, ∞}. In this paper, we introduce novel, scalable methods for projecting onto the ℓp-ball for general p>0. For p≥1, we solve the univariate Lagrangian dual via a dual Newton method. We then carefully design a bisection approach for p<1, presenting theoretical and empirical evidence of zero or a small duality gap in the non-convex case. Our methods are thoroughly assessed empirically and applied to large-scale regularized multi-task learning and compressed sensing. The code implementing our methods is publicly available on GitHub.
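For p ≥ 1 the Lagrangian dual is a search over a single multiplier λ ≥ 0. For a fixed λ, each coordinate of the minimizer keeps the sign of y_i, and its magnitude t solves t + λ·p·t^(p-1) = |y_i|. The ℓp norm of that point decreases as λ grows. The NumPy sketch below illustrates this structure using plain nested bisection, which is a simpler stand-in for the dual Newton method named in the abstract. The function names are hypothetical, and this is not the authors' published code.

```python
import numpy as np

# Hypothetical helper names; a sketch of the dual structure, not the paper's code.

def _shrink(a, lam, p, iters=60):
    """Solve t + lam*p*t**(p-1) = a elementwise for t in [0, a] by bisection.

    For p >= 1 the left-hand side is nondecreasing in t, so bisection applies.
    """
    lo = np.zeros_like(a)
    hi = a.copy()
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        too_big = mid + lam * p * mid ** (p - 1) > a
        hi = np.where(too_big, mid, hi)
        lo = np.where(too_big, lo, mid)
    return 0.5 * (lo + hi)


def project_lp_ball(y, r, p, iters=100):
    """Euclidean projection of y onto {x : ||x||_p <= r} for p >= 1."""
    if p < 1:
        raise ValueError("this sketch only covers the convex case p >= 1")
    y = np.asarray(y, dtype=float)
    a = np.abs(y)
    if np.sum(a ** p) <= r ** p:
        return y.copy()  # already inside the ball
    # ||x(lam)||_p decreases in lam: bracket the multiplier, then bisect on it.
    lam_lo, lam_hi = 0.0, 1.0
    while np.sum(_shrink(a, lam_hi, p) ** p) > r ** p:
        lam_hi *= 2.0
    for _ in range(iters):
        lam = 0.5 * (lam_lo + lam_hi)
        if np.sum(_shrink(a, lam, p) ** p) > r ** p:
            lam_lo = lam
        else:
            lam_hi = lam
    # lam_hi always yields a point inside the ball.
    return np.sign(y) * _shrink(a, lam_hi, p)


print(project_lp_ball([3.0, 4.0], 1.0, 2))  # close to [0.6, 0.8], i.e. y scaled by r/||y||_2
```

Both loops use bisection because each target function is monotone. This makes the sketch easy to check, but it is slower. A Newton update on λ, as the abstract describes, would typically need far fewer outer iterations.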

2.
Stat Sci ; 37(4): 494-518, 2022 Nov.
Article in English | MEDLINE | ID: mdl-37168541

ABSTRACT

Technological advances in the past decade, hardware and software alike, have made access to high-performance computing (HPC) easier than ever. We review these advances from a statistical computing perspective. Cloud computing makes access to supercomputers affordable. Deep learning software libraries make programming statistical algorithms easy and enable users to write code once and run it anywhere, from a laptop to a workstation with multiple graphics processing units (GPUs) or a supercomputer in a cloud. Highlighting how these developments benefit statisticians, we review recent optimization algorithms that are useful for high-dimensional models and can harness the power of HPC. Code snippets are provided to demonstrate the ease of programming. We also provide an easy-to-use distributed matrix data structure suitable for HPC. Employing this data structure, we illustrate various statistical applications, including large-scale positron emission tomography and ℓ1-regularized Cox regression. Our examples easily scale up to an 8-GPU workstation and a 720-CPU-core cluster in a cloud. As a case in point, we analyze the onset of type-2 diabetes from the UK Biobank with 200,000 subjects and about 500,000 single nucleotide polymorphisms using the HPC ℓ1-regularized Cox regression. Fitting this half-million-variate model takes less than 45 minutes and reconfirms known associations. To our knowledge, this is the first demonstration of the feasibility of penalized regression of survival outcomes at this scale.
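One common way to fit an ℓ1-regularized Cox model is proximal gradient descent. Each step takes a gradient step on the negative log partial likelihood and then soft-thresholds the coefficients. The single-machine NumPy sketch below illustrates that idea under our own assumptions: the Breslow form with no tied event times, a backtracking step size, and hypothetical function names. It is not the distributed multi-GPU code the review describes.

```python
import numpy as np

# Hypothetical helper names; a single-machine sketch, not the review's HPC implementation.

def cox_nll_and_grad(beta, X, time, event):
    """Negative log partial likelihood (Breslow form, assuming no tied times) and its gradient."""
    order = np.argsort(-time)                  # sort subjects by decreasing time
    Xs, ds = X[order], event[order]
    eta = Xs @ beta
    shift = eta.max()                          # subtract the max before exp for stability
    w = np.exp(eta - shift)
    # After sorting, the risk set of row i is rows 0..i, so cumulative sums give risk-set totals.
    cw = np.cumsum(w)
    cwx = np.cumsum(w[:, None] * Xs, axis=0)
    nll = -np.sum(ds * (eta - np.log(cw) - shift))
    grad = -np.sum(ds[:, None] * (Xs - cwx / cw[:, None]), axis=0)
    return nll, grad


def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def l1_cox(X, time, event, lam, iters=200, step=1.0):
    """Proximal gradient with backtracking for nll(beta) + lam * ||beta||_1."""
    beta = np.zeros(X.shape[1])
    f, g = cox_nll_and_grad(beta, X, time, event)
    for _ in range(iters):
        while True:
            cand = soft_threshold(beta - step * g, step * lam)
            f_c, g_c = cox_nll_and_grad(cand, X, time, event)
            d = cand - beta
            # Accept once the quadratic upper bound holds; the small slack absorbs rounding.
            if f_c <= f + g @ d + (d @ d) / (2.0 * step) + 1e-12 * max(1.0, abs(f)):
                break
            step *= 0.5
        beta, f, g = cand, f_c, g_c
    return beta
```

Each evaluation is dominated by the product `Xs @ beta` and two cumulative sums. These are the kinds of operations a distributed matrix data structure can spread across GPUs or cluster nodes.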

3.
SIAM J Optim ; 32(3): 2180-2207, 2022.
Article in English | MEDLINE | ID: mdl-37200831

ABSTRACT

This paper studies an optimization problem on the sum of traces of matrix quadratic forms in m semiorthogonal matrices, which can be considered a generalization of the synchronization of rotations. While the problem is nonconvex, this paper shows that its semidefinite programming relaxation solves the original nonconvex problem exactly with high probability under an additive noise model with small noise on the order of $O(m^{1/4})$. In addition, it shows that, with high probability, the sufficient condition for global optimality considered in Won, Zhou, and Lange [SIAM J. Matrix Anal. Appl., 42 (2021), pp. 859-882] is also necessary under a similar small noise condition. These results can be considered a generalization of existing results on phase synchronization.

4.
SIAM J Matrix Anal Appl ; 42(2): 859-882, 2021.
Article in English | MEDLINE | ID: mdl-34776610

ABSTRACT

This paper studies the problem of maximizing the sum of traces of matrix quadratic forms on a product of Stiefel manifolds. This orthogonal trace-sum maximization (OTSM) problem generalizes many interesting problems, such as generalized canonical correlation analysis (CCA), Procrustes analysis, and cryo-electron microscopy, a Nobel Prize-winning technique. For these applications, finding global solutions is highly desirable, but it has been unclear how to find even a stationary point, let alone test its global optimality. Through a close inspection of Ky Fan's classical result [Proc. Natl. Acad. Sci. USA, 35 (1949), pp. 652-655] on the variational formulation of the sum of the largest eigenvalues of a symmetric matrix, and a semidefinite programming (SDP) relaxation of the latter, we first provide a simple method to certify global optimality of a given stationary point of OTSM. This method only requires testing whether a symmetric matrix is positive semidefinite. A by-product of this analysis is an unexpected strong duality between Shapiro and Botha [SIAM J. Matrix Anal. Appl., 9 (1988), pp. 378-383] and Zhang and Singer [Linear Algebra Appl., 524 (2017), pp. 159-181]. After showing that a popular algorithm for generalized CCA and Procrustes analysis may generate oscillating iterates, we propose a simple fix that provably guarantees convergence to a stationary point. The combination of our algorithm and certificate reveals novel global optima of various instances of OTSM.
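Ky Fan's variational result concerns a symmetric n×n matrix A and k ≤ n. It states that the maximum of tr(UᵀAU) over n×k matrices U with orthonormal columns equals the sum of the k largest eigenvalues of A, and that the corresponding eigenvectors attain it. The NumPy sketch below checks this statement numerically. It also includes a generic positive-semidefiniteness test, which is the kind of check the certificate reduces to. The specific matrix the paper tests is not reproduced here, and the helper names are hypothetical.

```python
import numpy as np

# Hypothetical helper names; illustrates Ky Fan's maximum principle, not the OTSM certificate itself.

def ky_fan_value(A, k):
    """Sum of the k largest eigenvalues of the symmetric matrix A."""
    return np.sort(np.linalg.eigvalsh(A))[-k:].sum()

def trace_form(A, U):
    """tr(U^T A U)."""
    return np.trace(U.T @ A @ U)

def random_semiorthogonal(n, k, rng):
    """Random n x k matrix with orthonormal columns (reduced QR of a Gaussian matrix)."""
    Q, _ = np.linalg.qr(rng.normal(size=(n, k)))
    return Q

def is_psd(S, tol=1e-10):
    """Check positive semidefiniteness of a symmetric matrix via its smallest eigenvalue."""
    return np.linalg.eigvalsh(S).min() >= -tol

rng = np.random.default_rng(1)
B = rng.normal(size=(6, 6))
A = (B + B.T) / 2
k = 2
_, V = np.linalg.eigh(A)          # eigenvalues come back in ascending order
U_top = V[:, -k:]                 # eigenvectors of the k largest eigenvalues
print(np.isclose(trace_form(A, U_top), ky_fan_value(A, k)))  # True: the maximum is attained
print(max(trace_form(A, random_semiorthogonal(6, k, rng)) for _ in range(1000))
      <= ky_fan_value(A, k) + 1e-12)                         # True: no feasible U does better
```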

5.
Brief Bioinform ; 22(6), 2021 Nov 05.
Article in English | MEDLINE | ID: mdl-34254998

ABSTRACT

Statistical analysis of ultrahigh-dimensional omics-scale data has long depended on univariate hypothesis testing. With growing data features and samples, the obvious next step is to establish multivariable association analysis as a routine method to describe genotype-phenotype association. Here we present ParProx, a state-of-the-art implementation to optimize overlapping and non-overlapping group lasso regression models for time-to-event and classification analysis, with selection of variables grouped by biological priors. ParProx enables multivariable model fitting for ultrahigh-dimensional data within an architecture for parallel or distributed computing via latent variable group representation. It thereby aims to produce interpretable regression models consistent with known biological relationships among independent variables, a property often explored post hoc rather than during model estimation. Simulation studies clearly demonstrate the scalability of ParProx with graphics processing units in comparison to existing implementations. We illustrate the tool using three different omics data sets featuring moderate to large numbers of variables, where we use genomic regions and biological pathways as variable groups, rendering the selected independent variables directly interpretable with respect to those groups. ParProx is applicable to a wide range of studies using ultrahigh-dimensional omics data, from genome-wide association analysis to multi-omics studies where model estimation is computationally intractable with existing implementations.
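For non-overlapping groups, the penalty λ Σ_g ||β_g||₂ has a closed-form proximal operator called block soft-thresholding. It shrinks each group's coefficient vector toward zero and sets the whole group exactly to zero when its norm is at most λ. A latent-variable representation, as mentioned in the abstract, can handle overlapping groups in the standard way. Each group gets its own copy of every variable it contains, so the groups over the copies are disjoint and the same operator applies. The final coefficient of a variable is then the sum of its copies. The pure-Python sketch below shows these pieces. The helper names are hypothetical, and this is not ParProx's code.

```python
import math

# Hypothetical helper names; a sketch of the standard operators, not ParProx's implementation.

def block_soft_threshold(beta, groups, lam):
    """Proximal operator of lam * sum_g ||beta_g||_2 for groups that partition the indices."""
    out = [0.0] * len(beta)
    for idx in groups:
        norm = math.sqrt(sum(beta[i] ** 2 for i in idx))
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        for i in idx:
            out[i] = scale * beta[i]
    return out

def expand_overlapping(groups):
    """Give each (possibly overlapping) group its own copy of every variable it contains.

    Returns (col_map, new_groups): col_map[c] is the original variable behind copy c,
    and new_groups are disjoint index lists over the copies. The design matrix would be
    expanded by repeating its columns according to col_map.
    """
    col_map, new_groups = [], []
    for idx in groups:
        start = len(col_map)
        col_map.extend(idx)
        new_groups.append(list(range(start, start + len(idx))))
    return col_map, new_groups

def combine_latent(v, col_map, n_vars):
    """Recover beta_j as the sum of all latent copies of variable j."""
    beta = [0.0] * n_vars
    for copy, j in enumerate(col_map):
        beta[j] += v[copy]
    return beta

# Two overlapping groups sharing variable 1.
col_map, disjoint = expand_overlapping([[0, 1], [1, 2]])
print(col_map, disjoint)  # [0, 1, 1, 2] [[0, 1], [2, 3]]
```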


Subjects
Algorithms; Computational Biology/methods; Genomics/methods; Regression Analysis; Software; Biomarkers; Disease Susceptibility; Gene Expression Profiling; Humans; Mutation; Prognosis; Proportional Hazards Models; Protein Interaction Mapping

6.
Front Neurol ; 9: 679, 2018.
Article in English | MEDLINE | ID: mdl-30271370

ABSTRACT

The performance of models depends heavily not only on the algorithm used but also on the data set to which it is applied. This makes it difficult to compare newly developed tools with previously published approaches. Either researchers must first implement others' algorithms to establish an adequate benchmark on their own data, or a direct comparison of new and old techniques is infeasible. The Ischemic Stroke Lesion Segmentation (ISLES) challenge, which has now run for 3 consecutive years, aims to address this problem of comparability. ISLES 2016 and 2017 focused on lesion outcome prediction after ischemic stroke. By providing a uniformly pre-processed data set, they allowed researchers from all over the world to apply their algorithms directly. A total of nine teams participated in ISLES 2015, and 15 teams participated in ISLES 2016. Their performance was evaluated in a fair and transparent way to identify the state of the art among all submissions. Top-ranked teams almost always employed deep learning tools, predominantly convolutional neural networks (CNNs). Despite these efforts, lesion outcome prediction remains challenging. The annotated data set remains publicly available, and new approaches can be compared directly via the online evaluation system, which serves as a continuing benchmark (www.isles-challenge.org).

7.
BMC Bioinformatics ; 19(1): 170, 2018 May 11.
Article in English | MEDLINE | ID: mdl-29751737

ABSTRACT

After publication of the original article [1], it was found that the author affiliations had accidentally been left out of the PDF. The full affiliations can be found in this correction.

8.
BMC Bioinformatics ; 19(Suppl 1): 44, 2018 Feb 19.
Article in English | MEDLINE | ID: mdl-29504903

ABSTRACT

BACKGROUND: DNA damage causes aging, cancer, and other serious diseases. The comet assay can detect multiple types of DNA lesions with high sensitivity and has been widely applied. Although comet assay platforms have recently improved on the limited throughput and reproducibility of traditional assays, analyzing large quantities of comet data often requires tremendous human effort. To overcome this challenge, we propose HiComet, a computational tool that can rapidly recognize and characterize a large number of comets with little user intervention. RESULTS: We tested HiComet with real data from 35 high-throughput comet assay experiments, with over 700 comets in total. The proposed method provided unprecedented levels of performance as an automated comet recognition tool in terms of robustness (measured by precision and recall) and throughput. CONCLUSIONS: HiComet is an automated tool for high-throughput comet-assay analysis and could significantly facilitate the characterization of individual comets by accelerating its most rate-limiting step. An online implementation of HiComet is freely available at https://github.com/taehoonlee/HiComet/ .


Subjects
Comet Assay/methods; DNA Damage; Software; Algorithms; Image Processing, Computer-Assisted

9.
Article in English | MEDLINE | ID: mdl-26930691

ABSTRACT

To assess the genetic diversity of an environmental sample in metagenomics studies, the amplicon sequences of 16S rRNA genes need to be clustered into operational taxonomic units (OTUs). Many existing tools for OTU clustering trade off accuracy against computational efficiency. We propose a novel OTU clustering algorithm, hc-OTU, which achieves high accuracy and fast runtime by exploiting homopolymer compaction and k-mer profiling to significantly reduce the time needed to compute pairwise distances between amplicon sequences. We comprehensively compare the proposed method with other widely used methods, including UCLUST, CD-HIT, MOTHUR, ESPRIT, ESPRIT-TREE, and CLUSTOM, using nine different experimental datasets and many evaluation metrics, such as normalized mutual information, adjusted Rand index, measure of concordance, and F-score. Our evaluation reveals that the proposed method achieves accuracy comparable to that of MOTHUR and ESPRIT-TREE, two widely used OTU clustering methods, while delivering orders-of-magnitude speedups.
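Homopolymer compaction collapses each run of a repeated base into a single base. K-mer profiling compares sequences by their counts of length-k substrings instead of by full alignment. The pure-Python sketch below illustrates both ideas. The value k = 3 and the particular k-mer distance are assumptions chosen for illustration, and they need not match what hc-OTU uses.

```python
from collections import Counter
from itertools import groupby

# Hypothetical helpers; k = 3 and the distance below are illustrative choices, not hc-OTU's.

def compact_homopolymers(seq):
    """Collapse runs of identical bases, e.g. 'AAACCGTT' -> 'ACGT'."""
    return "".join(base for base, _ in groupby(seq))

def kmer_profile(seq, k=3):
    """Counts of all overlapping substrings of length k."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def kmer_distance(a, b, k=3):
    """1 - (shared k-mer count) / (k-mer count of the shorter sequence)."""
    pa, pb = kmer_profile(a, k), kmer_profile(b, k)
    shared = sum((pa & pb).values())
    denom = min(sum(pa.values()), sum(pb.values()))
    return 1.0 - shared / denom if denom else 1.0

s1 = compact_homopolymers("AAACCGTTTAGG")  # 'ACGTAG'
s2 = compact_homopolymers("ACCGTAAGG")     # 'ACGTAG'
print(kmer_distance(s1, s2))               # 0.0: the reads differ only in homopolymer lengths
```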


Subjects
Cluster Analysis; Metagenomics/methods; Sequence Analysis, DNA/methods; Algorithms; RNA, Ribosomal, 16S/genetics

10.
J R Stat Soc Series B Stat Methodol ; 75(3): 427-450, 2013 Jun 01.
Article in English | MEDLINE | ID: mdl-23730197

ABSTRACT

Estimation of high-dimensional covariance matrices is known to be a difficult problem, has many applications, and is of current interest to the larger statistics community. In many applications, including the so-called "large p small n" setting, the estimate of the covariance matrix is required to be not only invertible but also well-conditioned. Although many regularization schemes attempt to do this, none of them address the ill-conditioning problem directly. In this paper, we propose a maximum likelihood approach with the direct goal of obtaining a well-conditioned estimator. No sparsity assumption is imposed on either the covariance matrix or its inverse, which makes our procedure more widely applicable. We demonstrate that the proposed regularization scheme is computationally efficient, yields a type of Steinian shrinkage estimator, and has a natural Bayesian interpretation. We investigate the theoretical properties of the regularized covariance estimator comprehensively, including its regularization path, and develop an approach that adaptively determines the required level of regularization. Finally, we demonstrate the performance of the regularized estimator in decision-theoretic comparisons and in the financial portfolio optimization setting. The proposed approach has desirable properties and can serve as a competitive procedure, especially when the sample size is small and a well-conditioned estimator is required.
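One structural fact helps explain how a likelihood approach can target conditioning directly. Suppose the eigenvalues of the estimate are constrained to an interval [τ, κτ], so that its condition number is at most κ. Then the Gaussian likelihood is maximized by keeping the eigenvectors of the sample covariance and clipping each sample eigenvalue to that interval. The NumPy sketch below implements only this step, for a user-supplied τ and κ. Choosing the level of regularization is the part the paper develops, and it is not reproduced here. The function name is hypothetical.

```python
import numpy as np

# Hypothetical function name; tau and kappa are supplied by the user, not chosen as in the paper.

def clip_covariance(S, tau, kappa):
    """Gaussian MLE of the covariance subject to tau*I <= Sigma <= kappa*tau*I.

    Keeps the eigenvectors of the sample covariance S and clips its eigenvalues
    to [tau, kappa*tau], so the result is invertible with condition number <= kappa.
    """
    l, Q = np.linalg.eigh(S)
    return (Q * np.clip(l, tau, kappa * tau)) @ Q.T  # Q diag(clipped) Q^T

# "Large p small n": 10 observations of 20 variables give a singular sample covariance.
rng = np.random.default_rng(3)
S = np.cov(rng.normal(size=(10, 20)), rowvar=False)
Sigma = clip_covariance(S, tau=0.1, kappa=50.0)
print(np.linalg.cond(Sigma) <= 50.0 + 1e-6)  # True
```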

11.
IEEE Trans Biomed Eng ; 60(1): 240-4, 2013 Jan.
Article in English | MEDLINE | ID: mdl-22922688

ABSTRACT

Images captured using computed tomography and magnetic resonance angiography are used in the examination of the abdominal aorta and its branches. The examination of all clinically relevant branches simultaneously in a single 2-D image without any misleading overlaps facilitates the diagnosis of vascular abnormalities. This problem is called uncluttered single-image visualization (USIV). We can solve the USIV problem by assigning energy-based scores to visualization candidates and then finding the candidate that optimizes the score; this approach is similar to the manner in which the protein side-chain placement problem has been solved. To obtain near-optimum images, we need to explore the energy space extensively, which is often time consuming. This paper describes a method for exploring the energy space in a massively parallel fashion using graphics processing units. According to our experiments, in which we used 30 images obtained from five patients, the proposed method can reduce the total visualization time substantially. We believe that the proposed method can make a significant contribution to the effective visualization of abdominal vascular structures and precise diagnosis of related abnormalities.


Subjects
Image Processing, Computer-Assisted/methods; Magnetic Resonance Angiography/methods; Models, Cardiovascular; Tomography, X-Ray Computed/methods; Algorithms; Aorta, Abdominal/anatomy & histology; Aorta, Abdominal/diagnostic imaging; Humans

12.
IEEE Trans Vis Comput Graph ; 19(1): 81-93, 2013 Jan.
Article in English | MEDLINE | ID: mdl-22291148

ABSTRACT

Direct projection of 3D branching structures, such as networks of cables, blood vessels, or neurons, onto a 2D image creates the illusion of intersecting structural parts and creates challenges for understanding and communication. We present a method for visualizing such structures and demonstrate its utility in visualizing the abdominal aorta and its branches, whose tomographic images might be obtained by computed tomography or magnetic resonance angiography, in a single 2D stylistic image without overlaps among branches. The visualization method, termed uncluttered single-image visualization (USIV), involves optimization of geometry. This paper proposes a novel optimization technique that exploits a connection between the USIV optimization problem and the protein structure prediction problem. Adopting the integer linear programming-based formulation of the protein structure prediction problem, we tested the proposed technique using 30 visualizations produced from five patient scans with representative anatomical variants in the abdominal aortic vessel tree. The technique can exploit commodity-level parallelism, enabling the use of general-purpose graphics processing unit (GPGPU) technology, which yields a significant speedup. Comparison with a previously reported optimization technique suggests that the quality of the visualization is comparable in most respects, with a significant reduction in the computation time of the algorithm.


Subjects
Aorta, Abdominal/anatomy & histology; Computer Graphics; Image Enhancement/methods; Imaging, Three-Dimensional/methods; Signal Processing, Computer-Assisted; User-Computer Interface; Algorithms; Angiography/methods; Humans; Image Interpretation, Computer-Assisted/methods; Numerical Analysis, Computer-Assisted; Pattern Recognition, Automated/methods; Reproducibility of Results; Sensitivity and Specificity

13.
Proc Natl Acad Sci U S A ; 109(8): 2848-53, 2012 Feb 21.
Article in English | MEDLINE | ID: mdl-22323610

ABSTRACT

Highly multiplexed assays using antibody-coated, fluorescent (xMap) beads are widely used to measure quantities of soluble analytes, such as cytokines and antibodies, in clinical and other studies. Current analyses of these assays use methods based on standard curves that have limitations in detecting low- or high-abundance analytes. Here we describe SAxCyB (Significance Analysis of xMap Cytokine Beads), a method that uses fluorescence measurements of individual beads to find significant differences between experimental conditions. We show that SAxCyB outperforms conventional analysis schemes in both sensitivity (low fluorescence) and robustness (high variability) and has enabled us to find many new differentially expressed cytokines in published studies.


Subjects
Cytokines/analysis; Microspheres; Models, Statistical; Protein Array Analysis/methods; Animals; Cytokines/blood; Francisella tularensis/physiology; Humans; Mice; Models, Biological; Tularemia/blood; Tularemia/immunology

14.
PLoS One ; 6(11): e27891, 2011.
Article in English | MEDLINE | ID: mdl-22140480

ABSTRACT

Though they have recently fallen into some disrepute, genome-wide association studies (GWAS) have been formulated and applied to understanding essential hypertension. The principal goal here is to use data gathered in a GWAS to gauge the extent to which SNPs and their interactions with other features can be combined to predict mean arterial blood pressure (MAP) in 3138 pre-menopausal and naturally post-menopausal white women. More precisely, we quantify the extent to which such data permit prediction of MAP beyond what is possible from traditional risk factors such as blood cholesterol and glucose levels. Of course, these traditional risk factors are genetic, though typically not explicitly so. In all, 44 such risk factors/clinical variables were measured and 377,790 single nucleotide polymorphisms (SNPs) were genotyped. Data for the women we studied are from first-visit measurements taken as part of the Atherosclerosis Risk in Communities (ARIC) study. We begin by assessing the ability of non-SNP features to predict MAP, employing a novel two-stage regression technique: first the discovery of main effects, then the discovery of their interactions. The long list of genotyped SNPs is reduced to a manageable list for combining with non-SNP features in prediction; we adapted Efron's local false discovery rate to produce this reduced list. Selected non-SNP and SNP features and their interactions are used to predict MAP using adaptive linear regression. We quantify the quality of prediction by an estimated coefficient of determination ($R^2$). We compare the accuracy of prediction with and without information from SNPs.
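Efron's local false discovery rate for a z-score z is fdr(z) = π0·f0(z)/f(z). Here f0 is the null density, f is the density of all z-scores, and π0 is the proportion of nulls. The sketch below rests on simplifying assumptions of our own: a theoretical N(0, 1) null, π0 = 1, and a Gaussian kernel density estimate of f with Silverman's bandwidth. It shows how such a quantity can screen a long list of SNP statistics. The paper's exact adaptation is not reproduced, and the function name is hypothetical.

```python
import numpy as np

# Hypothetical function; theoretical N(0,1) null and pi0 = 1 are simplifying assumptions.

def local_fdr(z, pi0=1.0):
    """Estimate fdr(z) = pi0 * f0(z) / f(z) at each observed z-score, capped at 1.

    f is a Gaussian kernel density estimate (Silverman's rule-of-thumb bandwidth).
    The pairwise-difference matrix costs O(n^2) memory; hundreds of thousands of
    SNPs would call for a binned density estimate instead.
    """
    z = np.asarray(z, dtype=float)
    n = z.size
    h = 1.06 * z.std(ddof=1) * n ** (-1 / 5)
    diffs = (z[:, None] - z[None, :]) / h
    f = np.exp(-0.5 * diffs ** 2).sum(axis=1) / (n * h * np.sqrt(2 * np.pi))
    f0 = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    return np.minimum(pi0 * f0 / f, 1.0)

# Simulated z-scores: 950 nulls and 50 shifted signals.
rng = np.random.default_rng(4)
z = np.concatenate([rng.normal(size=950), rng.normal(loc=4.0, size=50)])
keep = np.flatnonzero(local_fdr(z) < 0.2)  # indices retained for the downstream model
```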


Subjects
Genetic Predisposition to Disease; Genome-Wide Association Study/methods; Hypertension/genetics; Polymorphism, Single Nucleotide/genetics; Aged; Algorithms; Female; Genetic Association Studies; Humans; Middle Aged; Phenotype

15.
Med Phys ; 36(11): 5245-60, 2009 Nov.
Article in English | MEDLINE | ID: mdl-19994535

ABSTRACT

PURPOSE: The authors develop a method to visualize the abdominal aorta and its branches, obtained by CT or MR angiography, in a single 2D stylistic image without overlap among branches. METHODS: The abdominal aortic vasculature is modeled as an articulated object whose underlying topology is a rooted tree. The inputs to the algorithm are the 3D centerlines of the abdominal aorta, its branches, and their associated diameter information. The visualization problem is formulated as an optimization problem that finds a spatial configuration of the bounding boxes of the centerlines most similar to the projection of the input onto a given viewing direction (e.g., anteroposterior), while not introducing intersections among the boxes. The optimization algorithm minimizes a score function based on the overlap of the bounding boxes and the deviation from the input. The output of the algorithm is used to produce a stylistic visualization on a plane, made of the 2D centerlines modulated by the associated diameter information. The authors performed a preliminary evaluation by asking three radiologists to label 366 arterial branches from the 30 visualizations of five cases produced by the method. Each of the five patients was presented in six different variant images, selected from ten variants with the three lowest and three highest scores. For each label, they assigned confidence and distortion ratings (low/medium/high). They studied the association between the quantitative metrics measured from the visualization and the subjective ratings by the radiologists. RESULTS: All resulting visualizations were free from branch overlaps. Labeling accuracies of the three readers were 93.4%, 94.5%, and 95.4%, respectively. For the total of 1098 samples, the distortion ratings were low: 77.39%, medium: 10.48%, and high: 12.12%. The confidence ratings were low: 5.56%, medium: 16.50%, and high: 77.94%. The association study shows that the proposed quantitative metrics can predict a reader's subjective ratings and suggests that the visualization with the lowest score should be selected for readers. CONCLUSIONS: The method for eliminating misleading false intersections in 2D projections of the abdominal aortic tree conserves the overall shape and does not diminish accurate identifiability of the branches.
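The score described in the METHODS section has two parts: how much the branch bounding boxes overlap, and how far the layout departs from the projected input. Below is a minimal pure-Python sketch of such a score for 2D boxes. The axis-aligned box representation, the squared-displacement deviation term, and the weight `w_dev` are hypothetical choices made for illustration. This is not the authors' score function.

```python
from itertools import combinations

# Hypothetical score: axis-aligned boxes and the weight w_dev are illustrative assumptions.

def overlap_area(a, b):
    """Intersection area of two axis-aligned boxes given as (xmin, ymin, xmax, ymax)."""
    dx = min(a[2], b[2]) - max(a[0], b[0])
    dy = min(a[3], b[3]) - max(a[1], b[1])
    return dx * dy if dx > 0 and dy > 0 else 0.0

def layout_score(boxes, reference, w_dev=0.1):
    """Total pairwise overlap plus w_dev times squared corner displacement from the reference.

    Lower is better; the overlap term is zero exactly when no two boxes intersect.
    """
    overlap = sum(overlap_area(a, b) for a, b in combinations(boxes, 2))
    deviation = sum((p - q) ** 2
                    for box, ref in zip(boxes, reference)
                    for p, q in zip(box, ref))
    return overlap + w_dev * deviation

reference = [(0, 0, 2, 2), (1, 1, 3, 3)]  # projected input: the two boxes overlap
moved = [(0, 0, 2, 2), (2, 0, 4, 2)]      # candidate layout: second box shifted aside
print(layout_score(reference, reference), layout_score(moved, reference))
```

In this toy example, moving the second box removes an overlap of area 1 in exchange for a small deviation penalty. An optimizer over candidate layouts would therefore prefer the moved configuration.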


Subjects
Algorithms; Angiography/methods; Aorta, Abdominal/anatomy & histology; Aorta, Abdominal/diagnostic imaging; Image Processing, Computer-Assisted/methods; Magnetic Resonance Angiography/methods; Tomography, X-Ray Computed/methods; Humans; Models, Anatomic; Time Factors

16.
Conf Proc IEEE Eng Med Biol Soc ; 2006: 3345-8, 2006.
Article in English | MEDLINE | ID: mdl-17946176

ABSTRACT

We developed a novel visualization method for providing an uncluttered view of the abdominal aorta and its branches. The method abstracts the complex geometry of vessels using a convex primitive, and uses a sweep line algorithm to find a suboptimal placement of the primitive. The method was evaluated using 10 CT angiography datasets and resulted in a clear visualization with all cluttering intersections removed. The method can be used to convey clinical findings, including lumen patency and lesion locations, in a single two-dimensional image.


Subjects
Aorta, Abdominal/diagnostic imaging; Adult; Aged; Aged, 80 and over; Algorithms; Aorta, Abdominal/anatomy & histology; Biomedical Engineering; Databases, Factual; Female; Humans; Magnetic Resonance Imaging/statistics & numerical data; Male; Middle Aged; Models, Anatomic; Models, Cardiovascular; Radiographic Image Interpretation, Computer-Assisted; Tomography, X-Ray Computed/statistics & numerical data